TheSys - A comprehensive thesaurus system for intelligent document analysis and text retrieval

Authors

  • Chin Lu
  • K. H. Lee
  • H. Y. Chen
Abstract

Well-designed thesauri can represent semantic/conceptual knowledge so as to reveal relationships among different elements in documents, thus serving as a critical tool in intelligent text retrieval systems and document analysis systems. In this paper, we present a thesaurus system, referred to as TheSys, which can be used as a tool for users to build thesauri according to their own requirements. Our goal is to design a comprehensive thesaurus building tool that can be used in any field of specialty rather than targeting one particular field. People can use our system to build an electronic thesaurus in any specialty field required for a specific application. We propose a thesaurus model, referred to as the thesaurus frame, which uses weighted links to represent semantic relationships among concepts and terms. Our approach is to use a set of controlled terms, referred to as semantemes, to build the thesaurus frame. This approach can effectively reduce the size of the thesaurus without compromising its intelligence.
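The abstract does not specify how the thesaurus frame is stored, so the following is only a minimal sketch of the general idea of weighted links between terms. The class name `ThesaurusFrame`, its methods, the relation labels, the weight range [0, 1], and the sample terms are all assumptions made for illustration; none of them come from TheSys itself. Links are treated as undirected here purely for simplicity.

```python
from collections import defaultdict


class ThesaurusFrame:
    """Hypothetical weighted-link thesaurus; not the TheSys implementation."""

    def __init__(self):
        # term -> {linked term -> (relation label, weight in [0, 1])}
        self._links = defaultdict(dict)

    def add_link(self, term_a, term_b, relation, weight):
        """Store an undirected, weighted link between two terms."""
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        self._links[term_a][term_b] = (relation, weight)
        self._links[term_b][term_a] = (relation, weight)

    def related(self, term, min_weight=0.0):
        """Return (term, relation, weight) tuples, strongest link first."""
        hits = [
            (other, relation, weight)
            for other, (relation, weight) in self._links.get(term, {}).items()
            if weight >= min_weight
        ]
        return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Illustrative data only; the terms and weights are invented.
frame = ThesaurusFrame()
frame.add_link("vehicle", "car", "related", 0.9)
frame.add_link("vehicle", "transport", "related", 0.6)
frame.add_link("car", "automobile", "synonym", 1.0)

print(frame.related("vehicle"))
print(frame.related("car", min_weight=0.95))
```

A retrieval system built on such a structure could, for instance, expand a query term with its linked terms and use the link weights to rank the expanded matches; whether TheSys does this is not stated in the abstract.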

Similar resources

A Method for Keyword Extraction and Word Weighting to Improve Persian Text Classification

Due to the ever-increasing expansion of information and the huge number of existing unstructured documents, keywords play a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. This research attempts to use a thesaurus (a structured word-net) to extract them automatically. A...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing it, and detecting and correcting image skew. In document segmentation, an algorith...

Large-Scale Linguistic Ontology as a Basis for Text Categorization of Legislative Documents

The paper describes the structure and properties of a large linguistic ontology, a new kind of information retrieval thesaurus: the Thesaurus on Sociopolitical Life for Conceptual Indexing. The thesaurus is used in various real-scale information-retrieval applications in the legal domain. At present one of the main applications of the Thesaurus is knowledge-based text categorization. Categories are ...

Deriving Concepts Hierarchy

Information Retrieval (IR) covers the problems of effectively storing, accessing, searching and locating documents in a large collection that are relevant to a user's information need or query. Many techniques and tools have been developed to improve these processes. One of these tools is the thesaurus. This paper will present a tool for users to build thesauri according to the...

Study of Ontology or Thesaurus Based Document Clustering and Information Retrieval

Document clustering automatically generates clusters from a whole document collection and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of new challenges, the most important being the volume of text data, its dimensionality, its sparsity and its complex semantics. These characteristics of text data require clustering techniqu...

Publication date: 1995